Extracting an AV speech source f
Authors
Abstract
We present a new approach to the source separation problem for multiple speech signals. Using the additional visual information from the speaker’s face, the method aims to extract one acoustic speech signal from a mixture of acoustic signals by exploiting its coherence with the speaker’s lip movements. We define a statistical model of the joint probability of the visual and spectral audio inputs to quantify this audio-visual coherence. Separation is then achieved by maximising this joint probability. Experiments on additive mixtures of 2, 3 and 5 sources show that the algorithm performs well, and systematically better than the classical blind source separation (BSS) algorithm JADE.
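The idea of separating by maximising an audio-visual joint probability can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the sources are synthetic amplitude-modulated noise, the "lip" feature is a noisy copy of the target's log envelope, the audio feature is mean-removed log frame energy, a single 2-D Gaussian stands in for the statistical model, and the demixing vector is found by a grid search over one angle. All function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
T, N = 400, 256  # number of frames, samples per frame (toy values)

def make_source(log_env):
    """White noise whose per-frame amplitude follows exp(log_env)."""
    return np.exp(log_env)[:, None] * rng.standard_normal((len(log_env), N))

def audio_feature(y):
    """Log frame energy, mean-removed so it ignores the overall gain."""
    a = np.log(np.mean(y ** 2, axis=1))
    return a - a.mean()

def visual_feature(log_env):
    """Hypothetical lip parameter: a noisy copy of the log envelope."""
    return log_env + 0.05 * rng.standard_normal(len(log_env))

# Fit the joint audio-visual model on clean training data of the target speaker.
train_env = rng.standard_normal(T)
z_train = np.column_stack([visual_feature(train_env),
                           audio_feature(make_source(train_env))])
mu = z_train.mean(axis=0)
cov = np.cov(z_train, rowvar=False)
cov_inv = np.linalg.inv(cov)
log_norm = -0.5 * np.log(np.linalg.det(2 * np.pi * cov))

def joint_loglik(v, a):
    """Sum over frames of log p(v_t, a_t) under the fitted Gaussian."""
    d = np.column_stack([v, a]) - mu
    return np.sum(-0.5 * np.einsum("ti,ij,tj->t", d, cov_inv, d) + log_norm)

# Test scene: the target speaker plus one competing source, mixed additively.
env1, env2 = rng.standard_normal(T), rng.standard_normal(T)
sources = np.stack([make_source(env1), make_source(env2)])  # (2, T, N)
A = np.array([[1.0, 0.6],
              [0.5, 1.0]])                                  # assumed mixing matrix
x = np.einsum("ij,jtn->itn", A, sources)                    # two observed mixtures
v = visual_feature(env1)                                    # video of the target only

# Search for the demixing direction that maximises the joint probability.
thetas = np.deg2rad(np.arange(180))
scores = np.array([
    joint_loglik(v, audio_feature(np.tensordot([np.cos(t), np.sin(t)], x, axes=1)))
    for t in thetas
])
best = thetas[np.argmax(scores)]
b = np.array([np.cos(best), np.sin(best)])

# Gains of each source in the extracted signal; the second should be near zero.
gains = b @ A
leak = abs(gains[1] / gains[0])
print(f"best angle: {np.rad2deg(best):.0f} deg, interference/target gain: {leak:.3f}")
```

The mean removal in `audio_feature` is a design choice of this sketch: it makes the score insensitive to the arbitrary gain of the extracted signal, so only the direction of the demixing vector matters.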
Similar resources
CORTICAL DYNAMICS OF AUDITORY-VISUAL SPEECH: A FORWARD MODEL OF MULTISENSORY INTEGRATION
Virginie van Wassenhove, Ph.D., 2004. Dissertation directed by: David Poeppel, Ph.D., Department of Linguistics, Department of Biology, Neuroscience and Cognitive Science Program. In noisy settings, seeing the interlocutor’s face helps to disambiguate what is being said. For this to hap...
Adaptive Estimation of Time-varying F Speech Based on an Excitat
This paper describes a method of extracting time-varying features that is effective for speech signals with high fundamental frequencies. The proposed method adopts a speech production model that consists of a Time-Varying AutoRegressive (TVAR) process for an articulatory filter and a Hidden Markov Model (HMM) for an excitation source. The model represents waveform amplitude variations by timev...
Audio-visual speech fragment decoding
This paper presents a robust speech recognition technique called audio-visual speech fragment decoding (AV-SFD), in which the visual signal is exploited both as a cue for source separation and as a carrier of phonetic information. The model builds on the existing audio-only SFD technique which, based on the auditory scene analysis account of perceptual organisation, works by combining a bottom-...
On timing in time-frequency analysis of speech signals
The objective of this paper is to demonstrate the importance of the position of the analysis time window in time-frequency analysis of speech signals. Speech signals contain information about the time-varying characteristics of the excitation source and the vocal tract system. Resolution in both the temporal and spectral domains is essential for extracting the source and system characteristics from...
Noise alters beta-band activity in superior temporal cortex during audiovisual speech processing
Speech recognition is improved when complementary visual information is available, especially under noisy acoustic conditions. Functional neuroimaging studies have suggested that the superior temporal sulcus (STS) plays an important role for this improvement. The spectrotemporal dynamics underlying audiovisual speech processing in the STS, and how these dynamics are affected by auditory noise, ...